Tree-Structured Classifiers

Author

  • Wei-Yin Loh
Abstract

A tree-structured classifier is a decision tree for predicting a class variable from one or more predictor variables. THAID [15, 7] was the first such algorithm. This article focuses on the CART® [2], C4.5 [17], and GUIDE [12] methods. The algorithms are briefly reviewed and their similarities and differences compared on a real data set and by simulation.

In a typical classification problem, we have a training sample L = {(X1, Y1), (X2, Y2), ..., (XN, YN)} of N observations, where each X = (X1, ..., XK) is a K-dimensional vector of predictor variables and Y is a class variable that takes one of J values. We want to construct a rule for predicting the Y value of a new observation given its value of X. If the predictor variables are all ordered, i.e., non-categorical, some popular classifiers are linear discriminant analysis (LDA), nearest neighbor, and support vector machines. (Categorical predictor variables can be accommodated by transformation to vectors of 0-1 dummy variables.) Although these classifiers often possess good prediction accuracy, they act like black boxes and do not provide much insight into the roles of the predictor variables.

A tree-structured classifier (or classification tree) is an attractive alternative because it is easy to interpret. It is a decision tree obtained by recursive partitioning of the X-space. An observation in a partition is predicted to belong to the class with minimum estimated misclassification cost. Classification trees have been demonstrated to possess high prediction accuracy compared to many other methods; see, e.g., Lim et al. [11], Perlich et al. [16], and Loh [12]. They do not require categorical predictor variables to be transformed. THAID [15, 7] is the first published algorithm. We review here the CART® [2], C4.5 [17], and GUIDE [12] algorithms and illustrate their similarities and differences on a real data set and by simulation.
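The prediction rule stated above, that an observation in a partition is assigned the class with minimum estimated misclassification cost, can be illustrated with a minimal sketch. It is not taken from CART, C4.5, or GUIDE. The function name `min_cost_class` and the dictionary layout of `cost` are assumptions made for illustration, and class proportions in the node are used as the probability estimates.

```python
from collections import Counter

def min_cost_class(labels, classes, cost=None):
    """Pick the class with minimum estimated misclassification cost in a node.

    labels  : class labels of the training observations in the partition
    classes : list of the J possible class values
    cost    : cost[(i, j)] is the cost of predicting j when the true class
              is i (hypothetical layout); defaults to unit cost per error
    """
    counts = Counter(labels)
    n = len(labels)
    if cost is None:
        # With unit costs the rule reduces to predicting the majority class.
        cost = {(i, j): 0.0 if i == j else 1.0
                for i in classes for j in classes}

    def estimated_cost(j):
        # Sum over true classes i of (proportion of i in node) * cost(i, j).
        return sum(counts[i] / n * cost[(i, j)] for i in classes)

    # Ties are broken in favor of the class listed first in `classes`.
    return min(classes, key=estimated_cost)


node_labels = ["a", "a", "a", "b"]
unit_choice = min_cost_class(node_labels, ["a", "b"])
unequal = {("a", "a"): 0.0, ("a", "b"): 1.0,
           ("b", "a"): 5.0, ("b", "b"): 0.0}
costly_choice = min_cost_class(node_labels, ["a", "b"], unequal)
```

With unit costs the majority class "a" is chosen. When missing a true "b" is made five times as costly, the same node predicts "b" even though "b" is in the minority.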


Similar resources

Tree Kernel Usage in Naive Bayes Classifiers

We present a novel approach in machine learning by combining naïve Bayes classifiers with tree kernels. Tree kernel methods produce promising results in machine learning tasks containing tree-structured attribute values. These kernel methods are used to compare two tree-structured attribute values recursively. Up to now tree kernels are only used in kernel machines like Support Vector Machines o...


Multi-View Forest: A New Ensemble Method based on Dempster-Shafer Evidence Theory

This paper proposes a new ensemble method that constructs an ensemble of tree-structured classifiers using multi-view learning. We are motivated by the fact that an ensemble can outperform its members providing that these classifiers are diverse and accurate. In order to construct diverse individual classifiers, we assume that the object to be classified is described by multiple feature sets (v...


Multi-View Forests of Tree-Structured Radial Basis Function Networks Based on Dempster-Shafer Evidence Theory

An essential requirement to create an accurate classifier ensemble is the diversity among the individual base classifiers. In this paper, Multi-View Forests, a method to construct ensembles of tree-structured radial basis function (RBF) networks using multi-view learning, is proposed. In multi-view learning it is assumed that the patterns to be classified are described by multiple feature sets (...



Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

Title of Dissertation: CLASSIFICATION AND COMPRESSION OF MULTIRESOLUTION VECTORS: A TREE STRUCTURED VECTOR QUANTIZER APPROACH Sudhir Varma, Doctor of Philosophy, 2002 Dissertation directed by: Professor John S. Baras Department of Electrical and Computer Engineering Tree structured classifiers and quantizers have been used with good success for problems ranging from successive refinement coding...


A Mixtures-of-Experts Framework for Multi-Label Classification

We develop a novel probabilistic approach for multi-label classification that is based on the mixtures-of-experts architecture combined with recently introduced conditional tree-structured Bayesian networks. Our approach captures different input-output relations from multi-label data using the efficient tree-structured classifiers, while the mixtures-of-experts architecture aims to compensate f...




Publication date: 2010